5 research outputs found

    Optimized Indexes for Data Structured Retrieval

    This work presents a novel index structure based on a suffix array and a ternary search tree, combined with rank and select succinct data structures. Suffix arrays were originally developed to reduce memory consumption compared with suffix trees, and ternary search trees combine the time efficiency of digital tries with the space efficiency of binary search trees. The rank of a symbol at a given position is the number of times the symbol appears in the corresponding prefix of the sequence. Select is the inverse operation: it retrieves the positions of the symbol's occurrences. These operations are widely used in information retrieval and management and form the basis of several data structures and algorithms for text collections, graphs, trees, etc. The resulting structure is faster than hashing for many typical search problems and supports a broader range of useful operations. We therefore implement a path index based on these data structures, which proves highly efficient for digital collections consisting of structured documents. We describe how the index architecture works, compare its search algorithms with others, and report experiments showing that it outperforms earlier approaches.
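
The rank and select definitions in the abstract above can be illustrated with a minimal sketch. It is a naive version built on per-symbol sorted position lists, not the succinct structure the paper describes, and the class name `RankSelect` and its method signatures are assumptions made for this illustration only.

```python
from bisect import bisect_left


class RankSelect:
    """Naive rank/select over a sequence (illustrative, not succinct).

    Assumption: positions are 0-based and occurrences are counted from 1.
    """

    def __init__(self, seq):
        # For every symbol, store the sorted list of positions where it occurs.
        self.positions = {}
        for i, sym in enumerate(seq):
            self.positions.setdefault(sym, []).append(i)

    def rank(self, sym, i):
        """Number of occurrences of sym in the prefix seq[0:i]."""
        return bisect_left(self.positions.get(sym, []), i)

    def select(self, sym, k):
        """Position of the k-th occurrence of sym, or -1 if there is none."""
        occ = self.positions.get(sym, [])
        return occ[k - 1] if 1 <= k <= len(occ) else -1


rs = RankSelect("abracadabra")
print(rs.rank("a", 4))    # occurrences of "a" in the prefix "abra"
print(rs.select("a", 3))  # position of the third "a"
```

The two operations are inverses in the sense that `rank(sym, select(sym, k))` equals `k - 1` whenever the k-th occurrence exists. A succinct implementation would answer the same queries from bit vectors with small auxiliary tables instead of explicit position lists.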

    Parsing Large XES Files for Discovering Process Models: A Big Data Problem

    Process mining is a group of techniques for retrieving de facto models from system traces. Discovery algorithms can obtain mathematical models by exploiting the information contained in lists of activity events. The completeness of the traces is relevant to the accuracy of the final results, and noiseless traces are the ideal scenario. The performance of the algorithms is significantly reduced if the log files are not processed efficiently. XES is a logical model for process logs stored in data-centric XML files. In real processes, log sizes increase exponentially. Parsing XES files is presented as a big data problem in real scenarios with dense traces. Lazy parsers and DOM models are not adequate in scenarios with large volumes of data. We discuss this problem and how to use indexing techniques to retrieve useful information for process mining. An XES compression scheme is also discussed as a way to reduce index construction time.
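
As a rough illustration of building an index from an XES log in a single streaming pass, the sketch below maps each trace name to the ordered list of its activity names. It is not the authors' method: the function names `local_name` and `index_traces` are hypothetical, the index is a plain in-memory dictionary, and the log is assumed to use the usual `trace`, `event` and `string` elements with a `concept:name` key.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}trace'."""
    return tag.rsplit("}", 1)[-1]


def index_traces(source):
    """Map each trace's concept:name to the list of its event activity names."""
    index = {}
    # iterparse yields elements as their closing tags are read,
    # so the whole document is never required up front.
    for _, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "trace":
            continue
        trace_name = None
        activities = []
        for child in elem:
            kind = local_name(child.tag)
            if kind == "string" and child.get("key") == "concept:name":
                trace_name = child.get("value")
            elif kind == "event":
                for attr in child:
                    if attr.get("key") == "concept:name":
                        activities.append(attr.get("value"))
        index[trace_name] = activities
        elem.clear()  # drop the trace's children once they are indexed
    return index


sample = b"""<log xmlns="http://www.xes-standard.org/">
  <trace>
    <string key="concept:name" value="case-1"/>
    <event><string key="concept:name" value="register"/></event>
    <event><string key="concept:name" value="approve"/></event>
  </trace>
</log>"""
print(index_traces(io.BytesIO(sample)))
```

For logs of the size the abstract has in mind, a dictionary like this would itself become too large, which is where more compact index structures and compression come in.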

    Arquitectura de búsqueda para repositorios de objetos de aprendizaje (Search Architecture for Learning Object Repositories)

    With the emergence of learning object repositories, and to make more efficient use of them, different search engines have been developed that allow users to find the materials they need easily and effectively. Many engines are keyword-based, which limits the search space; some are based on ontologies, and others, like the one used here, are based on the metadata that describe the learning objects. A very powerful engine is SOLR, which is built on the LUCENE library and runs as a Java web application, allowing documents to be submitted for indexing over HTTP and queried with HTTP GET requests. Experiments with the tool built on this engine demonstrate the effectiveness and quality of its results relative to the other engines.
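
To make the HTTP GET querying concrete, the sketch below builds a query URL for SOLR's standard `select` handler. The host, the core name `learning_objects`, the field `title` and the helper `build_solr_query_url` are assumptions for illustration and do not come from the paper. No request is sent.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, core, query, rows=10):
    """Build a GET URL for a Solr core's select handler.

    Assumption: base_url points at the Solr root, e.g. http://host:8983/solr.
    """
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/select?{params}"


url = build_solr_query_url(
    "http://localhost:8983/solr",  # hypothetical local instance
    "learning_objects",            # hypothetical core name
    "title:algebra",               # hypothetical metadata field and term
    rows=5,
)
print(url)
```

Fetching the resulting URL with any HTTP client would return the matching documents as JSON, provided a core with that name and field actually exists.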
